chore(spanner): use channel affinity#13231

Open
rahul2393 wants to merge 8 commits into main from use-channel-affinity

@rahul2393
Contributor

Internal reference: go/grpc-gcp-fixes#bookmark=id.q4x3oa8l672

rahul2393 requested review from a team as code owners May 20, 2026 04:46
}

public static TracingFramework getActiveTracingFramework() {
  synchronized (lock) {
Contributor Author

FYI: this change is unrelated, but since it is a small perf change I included it here. This lock was on the critical request path and showed up in the benchmark mutex profile.

Contributor

gemini-code-assist (bot) left a comment

Code Review

This pull request replaces the use of logical affinity keys and explicit unbinding with direct channel ID affinity using AtomicReference for Spanner operations. This change simplifies the integration with grpc-gcp by removing the need for affinity-key map management and the associated cleanup logic. Additionally, SpannerOptions was updated to use a volatile variable for the active tracing framework, removing unnecessary synchronization. I have no feedback to provide.
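The volatile-field change mentioned in the summary above can be sketched as follows. This is a minimal illustration, not the code in this PR: `TracingFrameworkHolder`, `enable`, and `getActive` are hypothetical names, the enum is a stand-in for the real one in SpannerOptions, and the fallback to `OPEN_CENSUS` when nothing has been set is an assumption of the sketch. The point is that writers can keep serializing on a lock while the hot-path getter becomes a single volatile read, which is what removes the getter from a mutex profile.

```java
// Hypothetical stand-in for the TracingFramework enum in SpannerOptions.
enum TracingFramework {
  OPEN_CENSUS,
  OPEN_TELEMETRY
}

// Hypothetical holder class; the real field lives in SpannerOptions.
final class TracingFrameworkHolder {
  private static final Object lock = new Object();

  // volatile: a value written under the lock is visible to readers
  // that never take the lock.
  private static volatile TracingFramework active;

  private TracingFrameworkHolder() {}

  // Rare path: writers still serialize so conflicting settings are rejected.
  static void enable(TracingFramework framework) {
    synchronized (lock) {
      if (active != null && active != framework) {
        throw new IllegalStateException("Tracing framework already set to " + active);
      }
      active = framework;
    }
  }

  // Hot path: one volatile read, no monitor acquisition.
  // The OPEN_CENSUS default is an assumption of this sketch.
  static TracingFramework getActive() {
    TracingFramework current = active;
    return current == null ? TracingFramework.OPEN_CENSUS : current;
  }
}
```

The getter copies the field into a local before the null check so that it reads the volatile field exactly once.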

@rahul2393
Contributor Author

/gemini review

Contributor

gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a ChannelAffinityRef mechanism to GcpManagedChannel to facilitate sticky channel routing without the need for an internal affinity-key map. This new approach is integrated into the Spanner client, replacing the previous CHANNEL_HINT and UNBIND_CHANNEL_HINT logic, and allows for more direct channel management during retries. The changes also include the removal of the GrpcGcpAffinityUtil class, the deprecation of explicit affinity cleanup methods in SpannerRpc, and updates to SpannerOptions for more efficient tracing framework access. Regarding the feedback, a potential NullPointerException was identified in GcpManagedChannel.newCall when no channels are available; it is recommended to handle this case by returning a NoopGcpClientCall.

Contributor

gemini-code-assist (bot) left a comment

Code Review

This pull request replaces the CHANNEL_HINT and UNBIND_CHANNEL_HINT mechanism with a new ChannelAffinityRef system for managing sticky channel routing in gRPC-GCP. This change eliminates the need for gRPC-GCP to maintain internal affinity-key maps for Spanner operations. Additionally, SpannerOptions was optimized by replacing synchronized access to the tracing framework with a volatile field. Review feedback identified a potential logic error in the channel selection loop where the 'use different channel' flag could be lost during retries, and recommended adding activity checks when picking fallback channels.

@rahul2393
Contributor Author

[Screenshot, 2026-05-20 7:11:55 PM]

A prober running stale_read at 400 QPS on DirectPath shows a saving of 200us.

rahul2393 force-pushed the use-channel-affinity branch from 1f07a11 to 36329d8 May 20, 2026 14:27
@rahul2393
Contributor Author

/gemini review

Contributor

gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new ChannelAffinityRef mechanism in GcpManagedChannel to replace the previous hint-based affinity system, simplifying channel lifecycle management for Cloud Spanner. The changes remove the need for explicit affinity unbinding and the GrpcGcpAffinityUtil class. Additionally, SpannerOptions was updated to use a volatile field for the tracing framework to improve concurrency. A critical review comment points out that getChannelRefByAffinityRef could return null, potentially causing a NullPointerException in newCall, and suggests throwing an exception instead to maintain invariants.

rahul2393 requested a review from olavloite May 20, 2026 14:55
@rahul2393
Contributor Author

[image]

A 30-minute YCSB run with 50 threads on a database with 10 GB of data shows an improvement of around 100us.

*/
protected ChannelRef getChannelRefByAffinityRef(ChannelAffinityRef affinityRef) {
  maybeDynamicUpscale();
  while (true) {
Contributor

nit: can we add a clarifying comment for why we are looping here?

Contributor Author

added
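For readers following this thread, one common reason for a `while (true)` loop in a method like this is a compare-and-set retry: two calls that share an affinity reference may race to pin a channel, and the loser has to re-read the winner's choice. The sketch below illustrates that pattern only. It is an assumption about the general technique, not the grpc-gcp implementation: `StickyChannelPicker`, `pick`, and the use of plain integer channel IDs are hypothetical. It also throws when the pool is empty rather than returning null, in the spirit of the review feedback above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical picker; channels are represented by integer IDs only.
final class StickyChannelPicker {
  private final List<Integer> channelIds;
  private final AtomicInteger next = new AtomicInteger();

  StickyChannelPicker(List<Integer> channelIds) {
    this.channelIds = new ArrayList<>(channelIds);
  }

  // Returns the channel ID pinned in affinityRef, pinning one first if the
  // reference is still empty. Never returns a "no channel" value.
  int pick(AtomicReference<Integer> affinityRef) {
    if (channelIds.isEmpty()) {
      // Fail loudly instead of returning null to the caller.
      throw new IllegalStateException("no channels available");
    }
    while (true) {
      Integer pinned = affinityRef.get();
      if (pinned != null) {
        return pinned; // already sticky: reuse the pinned channel
      }
      int index = Math.floorMod(next.getAndIncrement(), channelIds.size());
      Integer candidate = channelIds.get(index);
      if (affinityRef.compareAndSet(null, candidate)) {
        return candidate; // this call won the race and pinned the channel
      }
      // Another thread pinned a channel first; loop to re-read its choice.
    }
  }
}
```

Because the state lives in the caller-owned `AtomicReference`, the picker keeps no map from affinity keys to channels and has nothing to clean up when the caller drops the reference.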

}
ByteString id = getTransactionId();
if (id != null && !id.isEmpty()) {
  rpc.clearTransactionAndChannelAffinity(id, Option.CHANNEL_HINT.getLong(channelHint));
Contributor

This not only cleaned up channel affinity, but also transaction affinity for location-aware routing, right? I think that we still need the latter.

Contributor Author

Yes, it's intended.
For Omni, regardless of whether a read-only transaction is strong or stale, single-use or multi-use, each RPC can land on the server based on the routing hint, so pinning read-only transactions to the same endpoint is not needed.

Only read-write transactions need pinning to the same endpoint, and since there is only one gRPC channel per endpoint, there is no need to maintain affinities, so we can safely remove it.

public void clearTransactionAndChannelAffinity(
    ByteString transactionId, @Nullable Long channelHint) {
  if (keyAwareChannel != null) {
    keyAwareChannel.clearTransactionAndChannelAffinity(transactionId, channelHint);
Contributor

Are you sure that we do not need this call anymore? (See also my other comment on AbstractReadContext)

Contributor Author

Intended


void clearTransactionAndChannelAffinity(ByteString transactionId, @Nullable Long channelHint) {
  String address = transactionAffinities.asMap().remove(transactionId);
  readOnlyTxPreferLeader.invalidate(transactionId);
Contributor

Is this also no longer needed? (I mean specifically the readOnlyTxPreferLeader.invalidate call)

Contributor Author

Yes, not needed.

public static final CallOptions.Key<Integer> CHANNEL_ID_KEY =
    CallOptions.Key.create("GcpChannelId");

/** CallOptions key for sticky channel routing without affinity-key map state. */
Contributor

Overall, there are quite big changes to grpc-gcp being made in this pull request, but there are no tests that verify these changes. Can we add tests that cover the changes that we make to grpc-gcp here? Relying on tests in the Spanner client is not enough, as this is a standalone library that can be used by other clients.

Contributor

The test coverage for the changes to the Spanner client is also quite thin, but the existing tests generally do cover these changes. One interesting test (if possible) would be a test that really verifies that all requests in a single read/write or multi-use read-only transaction really all use the same gRPC channel (so basically checking the local port where the requests are coming from on the mock server).
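The port-based check suggested above could be built around a small recorder like the one sketched here. Everything in it is hypothetical (`ChannelPortRecorder`, `record`, `usedSingleChannel`, and string transaction IDs); it is not code from this PR. In a gRPC mock server, the client address for each call could be read in a `ServerInterceptor` from the call attributes via `Grpc.TRANSPORT_ATTR_REMOTE_ADDR` and passed to `record`. The sketch also assumes that each gRPC channel holds a single TCP connection, so that one client port corresponds to one channel.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical test helper: tracks which client ports each transaction used.
final class ChannelPortRecorder {
  private final Map<String, Set<Integer>> portsByTransaction = new ConcurrentHashMap<>();

  // Called once per request observed by the mock server.
  void record(String transactionId, InetSocketAddress clientAddress) {
    portsByTransaction
        .computeIfAbsent(transactionId, id -> ConcurrentHashMap.newKeySet())
        .add(clientAddress.getPort());
  }

  // True only if at least one request was seen and all of them arrived
  // from the same client port.
  boolean usedSingleChannel(String transactionId) {
    Set<Integer> ports = portsByTransaction.get(transactionId);
    return ports != null && ports.size() == 1;
  }
}
```

A test would run a read/write or multi-use read-only transaction against the mock server and then assert `usedSingleChannel` for that transaction's ID.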

Contributor Author

Added some more tests

rahul2393 requested a review from olavloite May 21, 2026 14:58